Abstract: We are interested in the statistical linear inverse problem
$Y = Af + \epsilon\xi$, where $A$ denotes a compact operator and $\epsilon\xi$ a
stochastic noise. We first investigate the link between some threshold
estimators and the risk hull point of view introduced in (5). The penalized
blockwise Stein's rule plays a central role in this study. In particular, this
estimator may be considered as a risk hull minimization method, provided the
penalty is well chosen. Using this perspective, we study the properties of the
threshold and propose an admissible range for the penalty leading to accurate
results. We eventually propose a penalty close to the lower bound of this range.
The risk hull point of view provides interesting tools for the construction of
adaptive estimators. It sheds light on the processes governing the behavior
of linear estimators. The variability of the problem may indeed be quite
large and should be carefully controlled.
1. Introduction

This paper deals with the statistical inverse problem:

$$Y = Af + \epsilon\xi,$$    (1.1)

where $H$, $K$ are Hilbert spaces and $A : H \to K$ denotes a linear operator. The
function $f \in H$ is unknown and has to be recovered from a measurement of $Af$
corrupted by some stochastic noise $\epsilon\xi$. Here, $\epsilon$ represents a positive
noise level and $\xi$ a Gaussian white noise (see (14) for more details). In
particular, for all $g \in K$, we can observe:

$$\langle Y, g\rangle = \langle Af, g\rangle + \epsilon\langle \xi, g\rangle,$$    (1.2)

where $\langle \xi, g\rangle \sim \mathcal{N}(0, \|g\|^2)$. Denote by $A^*$ the adjoint operator of $A$.
In the sequel, $A$ is supposed to be a compact operator. Such a restriction is
rather interesting from a mathematical point of view. The operator $(A^*A)^{-1}$ is
unbounded: the least squares solution $\hat f_{LS} = (A^*A)^{-1}A^*Y$ does not depend
continuously on $Y$. The problem is said to be ill-posed.

Several studies of ill-posed inverse problems in a statistical context have been
proposed in recent years. It would be impossible to cite them all. For the
interested reader, we may mention (12) and (11) for convolution operators, (15)
for the positron emission tomography problem, (9) in a wavelet setting, or (2)
for a general statistical approach and some rates of convergence. We also refer
to (10) for a survey in a numerical setting.

Using a specific representation (i.e. particular choices for $g$ in (1.2)) may help
in understanding the model (1.1). In this sense, the classical singular value
decomposition (SVD) is a very useful tool. Since $A^*A$ is compact and
self-adjoint, the associated sequence of eigenvalues $(b_k^2)_{k\in\mathbb{N}}$ is strictly
positive and converges to $0$ as $k \to +\infty$. The sequence of eigenvectors
$(\phi_k)_{k\in\mathbb{N}}$ is supposed in the sequel to be an orthonormal basis of $H$. For
all $k \in \mathbb{N}$, set $\psi_k = b_k^{-1}A\phi_k$. The triple $(b_k, \phi_k, \psi_k)_{k\in\mathbb{N}}$
verifies:

$$A\phi_k = b_k\psi_k, \qquad A^*\psi_k = b_k\phi_k,$$    (1.3)

for all $k \in \mathbb{N}$. This is the singular value decomposition of $A^*A$. This
representation leads to a simpler understanding of the model (1.1). Indeed, for
all $k \in \mathbb{N}$, using (1.3) and the properties of the Gaussian white noise:

$$y_k = \langle Y, \psi_k\rangle = \langle Af, \psi_k\rangle + \epsilon\langle \xi, \psi_k\rangle = b_k\langle f, \phi_k\rangle + \epsilon\xi_k,$$    (1.4)

where the $\xi_k$ are i.i.d. standard Gaussian variables. Hence, for all $k \in \mathbb{N}$, we
can obtain from (1.1) an observation on $\theta_k = \langle f, \phi_k\rangle$. In the $\ell^2$-sense,
$\theta = (\theta_k)_{k\in\mathbb{N}}$ and $f$ represent the same mathematical object. The sequence
space model (1.4) clarifies the effect of $A$ on the signal $f$. Since $A$ is
compact, $b_k \to 0$ as $k \to +\infty$. For large values of $k$, the coefficients
$b_k\theta_k$ are negligible compared to $\epsilon\xi_k$. In a certain sense, the signal is
smoothed by the operator. The recovery becomes difficult in the presence of
noise for large 'frequencies', i.e. when $k$ is large.
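
To make the sequence space model (1.4) concrete, the following Python sketch simulates observations $y_k = b_k\theta_k + \epsilon\xi_k$ and locates the first index at which the signal part $b_k\theta_k$ falls below the noise level. The decay rates $b_k = k^{-1}$ and $\theta_k = k^{-2}$, the noise level and the truncation at 200 coefficients are illustrative assumptions, not choices made in this paper.

```python
import numpy as np


def simulate_sequence_model(theta, b, eps, rng):
    """Draw y_k = b_k * theta_k + eps * xi_k with xi_k i.i.d. N(0, 1)."""
    xi = rng.standard_normal(theta.shape)
    return b * theta + eps * xi


rng = np.random.default_rng(0)
k = np.arange(1, 201)      # truncation at 200 coefficients (assumption)
b = k ** -1.0              # hypothetical singular values b_k = k^{-1}
theta = k ** -2.0          # hypothetical signal coefficients
eps = 5e-4                 # hypothetical noise level

y = simulate_sequence_model(theta, b, eps, rng)

# The signal part b_k * theta_k = k^{-3} is decreasing in k, so the first
# index where it drops below eps marks where noise starts to dominate.
signal = b * theta
first_dominated = int(k[signal < eps][0])
print(first_dominated)
```

Beyond this index the observations carry mostly noise, which is the difficulty at large 'frequencies' described above.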

From now on, our aim is to estimate the sequence $(\theta_k)_{k\in\mathbb{N}}$. Linear
estimation plays an important role in the inverse problem framework and is a
starting point for several recovery methods. Let $(\lambda_k)_{k\in\mathbb{N}}$ be a real
sequence with values in $[0, 1]$. In the following, this sequence will be called a
filter. The associated linear estimator is defined by:

$$\hat f_\lambda = \sum_{k=1}^{+\infty} \lambda_k b_k^{-1} y_k \phi_k.$$

In the sequel, $\hat f_\lambda$ may sometimes be identified with
$\hat\theta_\lambda = (\lambda_k b_k^{-1} y_k)_{k\in\mathbb{N}}$. The meaning will be clear from the context.
The error related to $\hat f_\lambda$ is measured through the quadratic risk:

$$R(\theta, \lambda) = E_\theta\|\hat f_\lambda - f\|^2 = \sum_{k=1}^{+\infty} (1 - \lambda_k)^2\theta_k^2 + \epsilon^2 \sum_{k=1}^{+\infty} \lambda_k^2 b_k^{-2} = E_\theta\|\hat\theta_\lambda - \theta\|^2.$$    (1.5)
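
The bias-variance structure of (1.5) can be evaluated directly for a finite number of coefficients. The sketch below is a minimal illustration; the function name `quadratic_risk`, the truncation to 50 coefficients and the particular sequences are assumptions made for the example.

```python
import numpy as np


def quadratic_risk(theta, b, eps, lam):
    """Quadratic risk (1.5) of a linear filter, truncated to len(theta) terms.

    Returns sum_k (1 - lam_k)^2 theta_k^2 + eps^2 sum_k lam_k^2 b_k^{-2}.
    """
    bias2 = np.sum((1.0 - lam) ** 2 * theta ** 2)
    variance = eps ** 2 * np.sum(lam ** 2 / b ** 2)
    return bias2 + variance


k = np.arange(1, 51)
b = k ** -1.0                      # hypothetical singular values
theta = k ** -2.0                  # hypothetical signal
eps = 0.01                         # hypothetical noise level

lam_cut = (k <= 5).astype(float)   # spectral cut-off filter keeping 5 terms
risk_cut = quadratic_risk(theta, b, eps, lam_cut)
print(risk_cut)
```

With the zero filter only the bias term remains, and with the filter identically equal to one only the variance term remains; these two extremes show why the filter has to balance both sums.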

The most interesting filters are data-driven in the sense that they do not require
a priori information on $f$. In order to evaluate the performance of a
data-driven estimator, we use in this paper the oracle point of view. Given a
family of estimators $T$, define:

$$\theta_T = \arg\inf_{\hat\theta \in T} E_\theta\|\hat\theta - \theta\|^2.$$

This is the oracle for the family $T$: it is the best possible estimator of
$\theta$ in the family $T$. A data-driven estimator $\theta^*$ may be compared to
$\theta_T$ via the oracle inequality:

$$E_\theta\|\theta^* - \theta\|^2 \le (1 + \vartheta_\epsilon) E_\theta\|\theta_T - \theta\|^2 + C\epsilon^2,$$    (1.6)

with $\vartheta_\epsilon, C > 0$. The quantity $C\epsilon^2$ is a residual term. The inequality
(1.6) is said to be sharp if $\vartheta_\epsilon \to 0$ as $\epsilon \to 0$: $\theta^*$ asymptotically
mimics the behaviour of $\theta_T$. Oracle inequalities play an important, though
recent, role in statistics. They provide a precise measure of the performance of
$\theta^*$. They do not require a priori information on the signal and are
non-asymptotic. In several situations, oracle results may also lead to
interesting minimax rates of convergence. This theory has given rise to a
considerable amount of literature. We mention in particular (9), (1), (6) or (3)
for a survey.

There exist several approaches leading to accurate oracle inequalities. The
unbiased risk estimation (URE) method presents an interesting behavior. However,
it does not take into account the variability of the problem. This is quite
problematic in the inverse problem framework. The risk hull minimization (RHM)
method initiated in (5) is an interesting alternative. This method proposes a
data-driven bandwidth for projection schemes in the SVD setting. The principle
is to identify the stochastic processes that control the behavior of a projection
estimator. Then, a deterministic criterion, called a hull, is constructed in order
to contain these processes. In several cases, this approach leads to an accurate
recovery.

Our aim in this paper is to obtain oracle inequalities on wide families of
estimators. In order to achieve this goal, we will consider the penalized
blockwise Stein's rule initiated by (8). Some specific choices of penalty have
already been proposed. Here, we present a general approach. The risk hull point
of view may clarify the role of the penalty. In particular, we present a link
between the RHM procedure and such threshold estimators.

This paper is organized as follows. The construction of the penalized blockwise
Stein's rule and some related results are recalled in Section 2. Section 3
presents a link between hulls and such threshold estimators. Section 4
investigates the performance of the penalized blockwise Stein's estimator
depending on the chosen penalty. Section 5 proposes some examples and a
discussion on the choice of the penalty. Some results on the theory of ordered
processes and the proofs of the main results are gathered in Section 6.

2. The penalized blockwise Stein's rule 

The construction of adaptive estimators is an interesting problem. In the
oracle sense, an ideal goal of adaptation is to obtain a sharp oracle inequality
over all possible estimators. This is in most cases an unreachable task since this
set is rather large. The difficulty of the oracle adaptation increases with the
size of the considered family. In this paper, we restrict ourselves to the linear
and monotone estimators. In the following, this family will be identified with
the collection:

$$\Lambda_{mon} = \{\lambda = (\lambda_k)_{k\in\mathbb{N}} \in \ell^2 : 1 \ge \lambda_1 \ge \dots \ge \lambda_k \ge \dots \ge 0\},$$

of linear and monotone filters. This family contains most of the existing linear
procedures. We may mention for instance the spectral cut-off, Tikhonov, Pinsker
or Landweber filters (see for instance (10) or (2)).

A good way to obtain oracle inequalities on $\Lambda_{mon}$ is to consider first
the family of blockwise constant filters $\Lambda^*$ defined by:

$$\Lambda^* = \{\lambda \in \ell^2 : 0 \le \lambda_k \le 1,\ \lambda_k = \lambda_{K_j},\ \forall k \in [K_j, K_{j+1} - 1],\ j = 0, \dots, J-1,\ \lambda_k = 0,\ k > N\},$$    (2.1)

where $J$, $N$ and $(K_j)_{j=0\dots J}$ are such that $K_0 = 1$, $K_J = N + 1$ and
$K_j > K_{j-1}$. In the following, we will also use the notations
$I_j = \{k \in [K_{j-1}, K_j - 1]\}$ and $T_j = K_j - K_{j-1}$, for all $j \in \{1, \dots, J\}$.

The family $\Lambda^*$ can easily be handled. In particular, each block $I_j$ can be
studied independently of the other ones. This considerably simplifies the study
of the considered estimators. Moreover, for all $\theta \in \ell^2$,

$$R(\theta, \lambda_{mon}) = \inf_{\lambda \in \Lambda_{mon}} R(\theta, \lambda) \quad \text{and} \quad R(\theta, \lambda^0) = \inf_{\lambda \in \Lambda^*} R(\theta, \lambda),$$    (2.2)

are in fact rather close, subject to some reasonable constraints on the sequences
$(b_k)_{k\in\mathbb{N}}$ and $(T_j)_{j=1\dots J}$ (see Section 5 or (8) for more details).

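In order to construct a data-driven filter on $\Lambda^*$, one may consider the
well-known unbiased risk estimation (URE) approach. The principle is rather
intuitive. We want to construct a filter as close as possible to the oracle on
$\Lambda^*$. The quadratic risk $R(\theta, \lambda)$ associated to each filter is unknown: it
explicitly depends on the function $f$, i.e. the sequence $\theta$. This term can
however be approximated by an estimator $U(y, \lambda)$. The related adaptive filter is
then defined as:

$$\lambda^{URE} = \arg\inf_{\lambda \in \Lambda^*} U(y, \lambda) = \arg\inf_{\lambda \in \Lambda^*} \Big[ \sum_{k=1}^{+\infty} (\lambda_k^2 - 2\lambda_k)(b_k^{-2}y_k^2 - \epsilon^2 b_k^{-2}) + \epsilon^2 \sum_{k=1}^{+\infty} \lambda_k^2 b_k^{-2} \Big],$$    (2.3)

and can explicitly be computed:

$$\lambda_k^{URE} = \Big(1 - \frac{\sigma_j^2}{\|\tilde y\|_{(j)}^2}\Big)_+, \quad k \in I_j,\ j = 1 \dots J, \qquad \lambda_k^{URE} = 0, \quad k > N,$$

with

$$\|\tilde y\|_{(j)}^2 = \sum_{k \in I_j} b_k^{-2} y_k^2 \quad \text{and} \quad \sigma_j^2 = \epsilon^2 \sum_{k \in I_j} b_k^{-2}, \quad \forall j \in \{1 \dots J\}.$$    (2.4)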

For all $j \in \{1, \dots, J\}$, $\sigma_j^2$ is in fact compared to $\|\tilde y\|_{(j)}^2$. Recall that
$E_\theta\|\tilde y\|_{(j)}^2 = \sum_{k \in I_j} \theta_k^2 + \sigma_j^2$. In a certain sense, $\lambda^{URE}$ only keeps the
blocks where the signal is not negligible compared to $\sigma_j^2$. However, $\lambda^{URE}$ is
rather unstable, due to the ill-posedness of the problem. It is only concerned
with the average behavior of the loss, or equivalently of $\|\tilde y\|_{(j)}^2$. The URE
approach does not take into account the variability of the problem: the variance
of $\|\tilde y\|_{(j)}^2$ explodes for large $j$. A complete discussion illustrated by some
numerical simulations is provided in (5) in a slightly different setting.

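To overcome the limitations of the URE method, one may consider penalized
estimators. Let $(\mathrm{pen}_j)_{j=1\dots J}$ be a real positive sequence. For all
$\lambda \in \Lambda^*$, consider the penalized unbiased risk estimator of the quadratic risk:

$$U_{pen}(y, \lambda) = \sum_{j=1}^{J} \Big[ (\lambda_{K_j}^2 - 2\lambda_{K_j})(\|\tilde y\|_{(j)}^2 - \sigma_j^2) + \lambda_{K_j}^2\sigma_j^2 + 2\lambda_{K_j}\mathrm{pen}_j \Big] = U(y, \lambda) + 2\sum_{j=1}^{J} \lambda_{K_j}\mathrm{pen}_j,$$    (2.5)

where $U(y, \lambda)$ is defined in (2.3). The penalty should ideally contain the
variability of the problem. An adaptive estimator may be constructed as the
minimizer of $U_{pen}(y, \lambda)$ on the family $\Lambda^*$. The solution is:

$$\lambda_k^* = \Big(1 - \frac{\sigma_j^2 + \mathrm{pen}_j}{\|\tilde y\|_{(j)}^2}\Big)_+, \quad k \in I_j,\ j = 1 \dots J, \qquad \lambda_k^* = 0, \quad k > N.$$    (2.6)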
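
The filter (2.6) is straightforward to compute from the data once the blocks are fixed. The sketch below is one possible implementation, assuming the block convention $I_j = [K_{j-1}, K_j - 1]$ with 1-based indices; the function name `penalized_blockwise_stein` and the toy inputs are illustrative assumptions. Setting all penalties to zero gives the URE filter.

```python
import numpy as np


def penalized_blockwise_stein(y, b, eps, K, pen):
    """Blockwise filter (1 - (sigma_j^2 + pen_j) / ||y~||_(j)^2)_+.

    K   : block boundaries K_0 = 1 < K_1 < ... < K_J = N + 1 (1-based indices),
          block I_j = [K_{j-1}, K_j - 1]  (convention assumed here).
    pen : sequence of J non-negative penalties.
    Coefficients with index k > N keep the value 0.
    """
    lam = np.zeros(len(y), dtype=float)
    for j in range(1, len(K)):
        lo, hi = K[j - 1] - 1, K[j] - 1          # 0-based half-open slice
        y_norm2 = np.sum(y[lo:hi] ** 2 / b[lo:hi] ** 2)
        sigma2 = eps ** 2 * np.sum(1.0 / b[lo:hi] ** 2)
        if y_norm2 > 0.0:
            lam[lo:hi] = max(0.0, 1.0 - (sigma2 + pen[j - 1]) / y_norm2)
    return lam


# Toy example with two blocks of length 2 (all values are assumptions).
b = np.ones(4)
y = np.array([2.0, 2.0, 0.0, 0.0])
eps = 1.0
K = [1, 3, 5]

lam_ure = penalized_blockwise_stein(y, b, eps, K, np.zeros(2))
lam_pen = penalized_blockwise_stein(y, b, eps, K, np.array([2.0, 0.0]))
print(lam_ure, lam_pen)
```

In the toy example the penalty lowers the weight on the first block from 0.75 to 0.5, which illustrates how a larger penalty shrinks a block more aggressively.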



Cavalier and Tsybakov (2002) proposed to choose $\mathrm{pen}_j = \varphi_j\sigma_j^2$ for all
$j \in \{1, \dots, J\}$. Concerning the sequence $(\varphi_j)_{j=1\dots J}$, they set the
following condition:

Assumption A1: There exists a constant $C_1$ independent of $\epsilon$ such that:
J 



Emax &. exp 
fee/, " 






^? 



for all i e {I,..., J}. 



9 7 —2 

< Ci, mi/i Aj = ^ ^, (2.7) 



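The performance of the related estimator $\theta^*$ is summarized in the following
theorem:

Theorem 1. (Cavalier and Tsybakov (2002)).

Let $\theta^*$ be the estimator associated with the filter $\lambda^*$ defined in (2.6) with
the penalty $\mathrm{pen}_j = \varphi_j\sigma_j^2$ for all $j \in \{1, \dots, J\}$. Assume that the sequence
$(\varphi_j)_{j=1\dots J}$ satisfies Assumption A1 and $\varphi_j < 1 - 4\Delta_j$ for all
$j \in \{1, \dots, J\}$. Then, for all $\theta \in \ell^2$ and $0 < \epsilon < 1$, we have:

$$E_\theta\|\theta^* - \theta\|^2 \le (1 + \varphi_\epsilon) \inf_{\lambda \in \Lambda^*} R(\theta, \lambda) + 8C_1\epsilon^2,$$

with $\varphi_\epsilon = \max_{1 \le j \le J}(2\varphi_j + 16\Delta_j/\varphi_j)$.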

For a large range of inverse problems, Theorem 1 provides a sharp oracle
inequality on $\Lambda^*$ for $\lambda^*$ with $\mathrm{pen}_j = \varphi_j\sigma_j^2$ for all $j \in \{1 \dots J\}$. Indeed,
assume for instance that the sequence $(b_k)_{k\in\mathbb{N}}$ possesses a polynomial decay. In
this case, with an appropriate choice of blocks, $\Delta_j \to 0$ as $\epsilon \to 0$. It remains
to choose $\varphi_j$ such that $\varphi_\epsilon \to 0$ as $\epsilon \to 0$.

In this paper, we investigate the relationship between the properties of the
penalty $(\mathrm{pen}_j)_{j=1\dots J}$ and the performance of the related estimator. In
particular, we shed some light on the link between such threshold estimators and
the risk hull point of view introduced in (5). The underlying aim of this study
is to clarify the role of the penalty and provide an admissible range in the
oracle sense.



3. Risk hull and penalties 

The principle of the risk hull minimization method was introduced in (5).
It provides an adaptive bandwidth choice for the projection (also called
spectral cut-off) filters: $(1_{\{k \le N\}})_{k\in\mathbb{N}}$ with $N \in \mathbb{N}$. The projection estimation
in the SVD formalism (1.4) may be seen as a toy model for adaptive estimation.
Nevertheless, it presents several difficulties and requires a careful treatment.
In this section, we recall the principle of the risk hull minimization for
projection schemes. Then, we prove that the penalized quadratic risk is a hull
for the family of blockwise constant filters. This result requires some
conditions on the penalty $(\mathrm{pen}_j)_{j=1\dots J}$: the role of this sequence is then made
precise.

For all $N \in \mathbb{N}$, denote by $\hat\theta_N$ the projection estimator associated with the
filter $(1_{\{k \le N\}})_{k\in\mathbb{N}}$. For each value of $N \in \mathbb{N}$, the related quadratic risk is:

$$E_\theta\|\hat\theta_N - \theta\|^2 = \sum_{k=1}^{N} E_\theta(b_k^{-1}y_k - \theta_k)^2 + \sum_{k>N} \theta_k^2 = \sum_{k>N} \theta_k^2 + \epsilon^2 \sum_{k=1}^{N} b_k^{-2}.$$    (3.1)

The optimal choice for $N$ is the oracle $N^0$ that minimizes $E_\theta\|\hat\theta_N - \theta\|^2$. It is
a trade-off between the two sums (bias and variance) on the r.h.s. of (3.1).
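
The trade-off in (3.1) can be computed explicitly when $\theta$ is known, which gives the oracle bandwidth. The following sketch assumes a finite number of coefficients; the function name `projection_risks` and the toy sequences are hypothetical and serve only as an illustration.

```python
import numpy as np


def projection_risks(theta, b, eps):
    """Risk (3.1) of the projection estimator for N = 0, 1, ..., len(theta).

    Entry N equals sum_{k > N} theta_k^2 + eps^2 * sum_{k <= N} b_k^{-2}.
    """
    theta2 = theta ** 2
    # tail[i] = sum of theta_k^2 over 0-based indices >= i, i.e. k > N for N = i
    tail = np.cumsum(theta2[::-1])[::-1]
    bias2 = np.append(tail, 0.0)
    variance = np.concatenate(([0.0], eps ** 2 * np.cumsum(1.0 / b ** 2)))
    return bias2 + variance


theta = np.array([1.0, 1.0, 0.0, 0.0])   # toy signal (assumption)
b = np.ones(4)                           # toy singular values (assumption)
eps = 0.5

risks = projection_risks(theta, b, eps)
N_oracle = int(np.argmin(risks))
print(risks, N_oracle)
```

The oracle depends on the unknown $\theta$, so it is not an estimator; it only serves as the benchmark that a data-driven choice of $N$ tries to mimic.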

In order to construct an adaptive bandwidth $N^*$, one may use the classical
URE procedure. This approach has been studied in this setting by (6). The
theoretical results are interesting but the numerical simulations may be somewhat
disappointing in several cases. The URE approach does not take into account the
variability of the problem, which may be quite large when considering compact
operators.

The construction of an adaptive filter can be decomposed in two steps. First,
evaluate the error associated to each filter by constructing an appropriate
criterion. Then, use this criterion in order to propose an adaptive estimator. The
criterion associated to the URE approach is the quadratic risk. It corresponds
to the average behavior of the considered estimators and does not detect the
variability. The authors of (5) are interested instead in the loss:

$$l(\theta, N) = \sum_{k>N} \theta_k^2 + \epsilon^2 \sum_{k=1}^{N} b_k^{-2}\xi_k^2.$$

As a criterion, they use a deterministic term $V(\theta, N)$, called a hull, satisfying:

$$E_\theta \sup_{N \in \mathbb{N}} [l(\theta, N) - V(\theta, N)] \le 0.$$    (3.2)

This hull bounds uniformly the loss in the sense of inequality (3.2). It is
constructed in order to contain the variability of the projection estimators.

The risk hull point of view makes the role of the stochastic processes involved
in linear estimation more precise. In the same way, it quantifies the limitations
of the URE method and may lead to an accurate understanding of the problem.
The hull proposed by (5) precisely takes into account the variability of the
problem. The related adaptive estimator possesses interesting theoretical and
numerical properties. In the same spirit, we mention (17) for general families of
linear estimators.

The risk hull point of view may provide an interesting perspective on the 
blockwise constant adaptive approach. In this section, we propose sufficient 
conditions on the penalty making the penalized quadratic risk a hull. 

First, we introduce some notation. For all $j \in \{1, \dots, J\}$, let $\eta_j$ be defined by:

$$\eta_j = \epsilon^2 \sum_{k \in I_j} b_k^{-2}(\xi_k^2 - 1).$$    (3.3)

The random variable $\eta_j$ plays a central role in blockwise constant estimation.
It corresponds to the main stochastic part of the loss in each block $I_j$. The hull
proposed in Theorem 2 below is constructed in order to contain these terms.
Introduce also:

$$\rho_\epsilon = \max_{j=1\dots J} \sqrt{\Delta_j} \quad \text{and} \quad \|\theta\|_{(j)}^2 = \sum_{k \in I_j} \theta_k^2, \quad \forall j \in \{1, \dots, J\},$$    (3.4)

where $\Delta_j$ is defined in (2.7). We will see that $\rho_\epsilon \to 0$ as $\epsilon \to 0$ with appropriate
choices of blocks and minor assumptions on the sequence $(b_k)_{k\in\mathbb{N}}$ (see Section
5 for more details). Concerning the penalty, we set the following condition:

Assumption A2: There exists a constant $C_2$ independent of $\epsilon$ such that:

$$\sum_{j=1}^{J} E_\theta[\eta_j - \mathrm{pen}_j]_+ \le C_2\epsilon^2.$$    (3.5)

Assumptions A1 and A2 are in fact rather close. Indeed, the exponential term
in A1 corresponds to the probability of the event $\{\lambda_j^* > 0\}$ when the
signal-to-noise ratio is small on the block $j$. In such a situation, $\|\tilde y\|_{(j)}^2$ is
close to $\sigma_j^2 + \eta_j$. Recall that for all $j \in \{1 \dots J\}$,

$$\lambda_j^* = \Big(1 - \frac{\sigma_j^2 + \mathrm{pen}_j}{\|\tilde y\|_{(j)}^2}\Big)_+.$$

If the penalty is 'well chosen', $P(\lambda_j^* > 0)$ is small, i.e. the penalty controls in a
certain sense the variables $\eta_j$. This is exactly the principle of Assumption A2.
In Section 4, the link between these two hypotheses will be strengthened via an
upper bound of the l.h.s. of (3.5).

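The proof of the following result is presented in Section 6.

Theorem 2. Assume that Assumption A2 holds. Then, there exists $B > 0$ such
that:

$$V(\theta, \lambda) = (1 + B\rho_\epsilon) \sum_{j=1}^{J} \Big[ (1 - \lambda_{K_j})^2\|\theta\|_{(j)}^2 + \lambda_{K_j}^2\sigma_j^2 + 2\lambda_{K_j}\mathrm{pen}_j \Big] + \sum_{k>N} \theta_k^2 + C_2\epsilon^2 + B\rho_\epsilon R(\theta, \lambda^0),$$    (3.6)

is a risk hull on $\Lambda^*$, i.e.:

$$E_\theta \sup_{\lambda \in \Lambda^*} \big[ \|\hat\theta_\lambda - \theta\|^2 - V(\theta, \lambda) \big] \le 0.$$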

Theorem 2 states in fact that the penalized quadratic risk:

$$R_{pen}(\theta, \lambda) = \sum_{j=1}^{J} \Big[ (1 - \lambda_{K_j})^2\|\theta\|_{(j)}^2 + \lambda_{K_j}^2\sigma_j^2 + 2\lambda_{K_j}\mathrm{pen}_j \Big] + \sum_{k>N} \theta_k^2$$    (3.7)

is, up to some constants and residual terms, a risk hull on the family $\Lambda^*$. Hence,
$R_{pen}(\theta, \lambda)$ may be a good criterion for the choice of $\lambda^*$, provided that inequality
(3.5) is satisfied. In a certain sense, Theorem 2 justifies the approach presented
in Section 2 since the term $U_{pen}(y, \lambda)$ defined in (2.5) is an estimator of
$R_{pen}(\theta, \lambda)$. Theorem 2 makes the role of the penalty more precise. In order to
obtain a hull, we need to construct a penalty that contains, in the sense of
inequality (3.5), the variables $(\eta_j)_{j=1\dots J}$.

This result is established for general blocks. Following (7), there exist several
choices that may lead to interesting results. An example is presented in Section 5.

The penalty $(\mathrm{pen}_j)_{j=1\dots J}$ in the hull $V(\theta, \lambda)$ is associated in each block $I_j$
to the term $2\lambda_{K_j}$. The construction is then closely related to the problem of
proving that:

$$\sum_{j=1}^{J} E_\theta \sup_{\lambda_j \in [0,1]} \{\lambda_j^2\eta_j - 2\lambda_j\mathrm{pen}_j\} \le C_2\epsilon^2,$$

see the proof of Theorem 2 in Section 6. The penalty may also be associated to
the term $\lambda_{K_j}^2$. In this case, we obtain the following result.

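Theorem 3. Assume that Assumption A2 holds. Then, there exists $B > 0$ such
that:

$$W(\theta, \lambda) = (1 + B\rho_\epsilon) \sum_{j=1}^{J} \Big[ (1 - \lambda_{K_j})^2\|\theta\|_{(j)}^2 + \lambda_{K_j}^2\sigma_j^2 + \lambda_{K_j}^2\mathrm{pen}_j \Big] + \sum_{k>N} \theta_k^2 + C_2\epsilon^2 + B\rho_\epsilon R(\theta, \lambda^0),$$    (3.8)

is a risk hull on $\Lambda^*$, i.e.

$$E_\theta \sup_{\lambda \in \Lambda^*} \big[ \|\hat\theta_\lambda - \theta\|^2 - W(\theta, \lambda) \big] \le 0.$$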

Both hulls $V(\theta, \lambda)$ and $W(\theta, \lambda)$ contain the loss in the same way, i.e. the
residual terms and the constant $B$ are exactly the same. However, $W(\theta, \lambda)$ is
slightly smaller than $V(\theta, \lambda)$ since $0 \le \lambda_j \le 1$ for all $j \in \{1 \dots J\}$. In some
sense, $W(\theta, \lambda)$ is more precise. Nevertheless, we will use $V(\theta, \lambda)$ as a criterion
in the following: the associated estimator has a simpler form. We will also see
in Section 4 that the variability of the estimator of $V(\theta, \lambda)$ is contained by the
penalty. This is not the case for $W(\theta, \lambda)$.

The proof of Theorem 3 is presented in Section 6. It follows essentially the
same lines as the proof of Theorem 2.

4. Oracle inequalities 

In Section 3, we have proposed a family of hulls indexed by the penalty
$(\mathrm{pen}_j)_{j=1\dots J}$. A hull may be a good criterion in order to evaluate the
performance of the estimators contained in $\Lambda^*$ since it takes into account the
variability of the problem. In this section, we are interested in the performance
of the estimators constructed from these hulls. We are looking for conditions on
the penalty $(\mathrm{pen}_j)_{j=1\dots J}$ that may lead to sharp oracle inequalities.

In the sequel, we set $\lambda_j = \lambda_{K_j}$ for all $j \in \{1 \dots J\}$. This is a slight abuse of
notation but the meaning will be clear from the context. Then define:

$$U_{pen}(y, \lambda) = \sum_{j=1}^{J} \Big[ (\lambda_j^2 - 2\lambda_j)(\|\tilde y\|_{(j)}^2 - \sigma_j^2) + \lambda_j^2\sigma_j^2 + 2\lambda_j\mathrm{pen}_j \Big].$$

The term $U_{pen}(y, \lambda)$ is an estimator of the penalized quadratic risk $R_{pen}(\theta, \lambda)$
defined in (3.7). Recall that from Theorem 2, this term is, up to some constant
and residual terms, a risk hull. Then denote by $\theta^*$ the estimator associated
with the filter:

$$\lambda^* = \arg\min_{\lambda \in \Lambda^*} U_{pen}(y, \lambda).$$    (4.1)

The solution of (4.1) is the penalized blockwise Stein's estimator introduced in
(2.6).
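
The identification of the minimizer of (4.1) with the filter (2.6) follows from a blockwise computation. The short derivation below is a sketch that uses only the definition of the penalized criterion; the symbol $g_j$, denoting the $j$-th summand, is introduced here purely for illustration.

```latex
% Blockwise minimization of the penalized criterion (sketch).
% g_j is a hypothetical name for the j-th summand of U_pen(y, lambda).
\begin{align*}
g_j(\lambda_j) &= (\lambda_j^2 - 2\lambda_j)\bigl(\|\tilde y\|_{(j)}^2 - \sigma_j^2\bigr)
                 + \lambda_j^2\sigma_j^2 + 2\lambda_j\,\mathrm{pen}_j \\
               &= \lambda_j^2\,\|\tilde y\|_{(j)}^2
                 - 2\lambda_j\bigl(\|\tilde y\|_{(j)}^2 - \sigma_j^2 - \mathrm{pen}_j\bigr), \\
g_j'(\lambda_j) = 0 &\iff \lambda_j = 1 - \frac{\sigma_j^2 + \mathrm{pen}_j}{\|\tilde y\|_{(j)}^2}.
\end{align*}
```

Since $g_j$ is convex and the unconstrained minimizer never exceeds 1, restricting $\lambda_j$ to $[0, 1]$ amounts to taking the positive part, which is the form given in (2.6).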

Theorem 4. Assume that Assumption A2 holds. Then, there exists $C^* > 0$
independent of $\epsilon$ such that, for all $\theta \in \ell^2$ and any $0 < \epsilon < 1$:

$$E_\theta\|\theta^* - \theta\|^2 \le (1 + \tau_\epsilon) \inf_{\lambda \in \Lambda^*} R(\theta, \lambda) + C^*\epsilon^2,$$

where $\tau_\epsilon \to 0$ as $\epsilon \to 0$ provided $\max_j \mathrm{pen}_j/\sigma_j^2 \to 0$ and $\rho_\epsilon \to 0$ as $\epsilon \to 0$.

In some sense, this result generalizes Theorem 1. However, we construct a
different proof through the risk hull point of view (see Section 6).

Theorem 4 provides in fact an admissible range for the penalty. If we want
a sharp oracle inequality, necessarily $\max_j \mathrm{pen}_j/\sigma_j^2 \to 0$ as $\epsilon \to 0$. Hence, the
penalty should not be too large. At the same time, we require from Assumption
A2 that the penalty contains in a certain sense the variables $(\eta_j)_{j=1\dots J}$. The
following lemma provides upper bounds on the term $E_\theta[\eta_j - \mathrm{pen}_j]_+$ and makes
the behavior of the penalty more explicit.

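Lemma 1. For all $j \in \{1, \dots, J\}$ and $\delta$ such that $0 < \delta < \epsilon^{-2}b_{K_j-1}^2/2$:

$$E_\theta[\eta_j - \mathrm{pen}_j]_+ \le \delta^{-1} \exp\Big\{ -\delta\,\mathrm{pen}_j + \delta^2\Sigma_j^2 + 4\delta^3 \sum_{k \in I_j} \frac{\epsilon^6 b_k^{-6}}{1 - 2\delta\epsilon^2 b_k^{-2}} \Big\},$$

with

$$\Sigma_j^2 = \epsilon^4 \sum_{k \in I_j} b_k^{-4}.$$    (4.2)

PROOF. Let $j \in \{1, \dots, J\}$ be fixed. First remark that for all $\delta > 0$:

$$E_\theta[\eta_j - \mathrm{pen}_j]_+ = \int_{\mathrm{pen}_j}^{+\infty} P(\eta_j > t)\,dt = \int_{\mathrm{pen}_j}^{+\infty} P(\exp(\delta\eta_j) > e^{\delta t})\,dt \le \delta^{-1} e^{-\delta\,\mathrm{pen}_j} E_\theta \exp(\delta\eta_j).$$

Then, provided $0 < \delta < \epsilon^{-2}b_{K_j-1}^2/2$:

$$E_\theta \exp(\delta\eta_j) \le \exp\Big\{ \delta^2\Sigma_j^2 + 4\delta^3 \sum_{k \in I_j} \frac{\epsilon^6 b_k^{-6}}{1 - 2\delta\epsilon^2 b_k^{-2}} \Big\}.$$

This concludes the proof of Lemma 1.

□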

Let $u$ and $v$ be two real sequences. Here and in the sequel, for all $k \in \mathbb{N}$, we
write $u_k \lesssim v_k$ if we can find a positive constant $C$ independent of $k$ such that
$u_k \le Cv_k$, and $u_k \asymp v_k$ if both $u_k \lesssim v_k$ and $u_k \gtrsim v_k$.

From Assumption A2 and Lemma 1, it is possible to prove that $(\mathrm{pen}_j)_{j=1\dots J}$
should at least fulfill $\mathrm{pen}_j \gtrsim \Sigma_j$ for all $j \in \{1, \dots, J\}$. Since we require at the
same time $\max_j \mathrm{pen}_j/\sigma_j^2 \to 0$ as $\epsilon \to 0$, an admissible penalty in the sense
of Theorem 4 should satisfy:

$$\Sigma_j \lesssim \mathrm{pen}_j \lesssim \sigma_j^2, \quad \forall j \in \{1, \dots, J\}.$$    (4.3)

In Section 5, we present a discussion on the choice of the penalty. A specific
assumption on $(b_k)_{k\in\mathbb{N}}$ makes the situation easier to handle. We present examples
where $\mathrm{pen}_j/\Sigma_j \to +\infty$ as $j \to +\infty$ and $\epsilon \to 0$. In particular, we will see that
the penalty $\mathrm{pen}_j = \varphi_j\sigma_j^2$ with $\varphi_j = \Delta_j^\gamma$ and $0 < \gamma < 1/2$ is admissible in the
sense of (4.3).

We are now able to propose an oracle inequality on $\Lambda_{mon}$, the family of
monotone filters.

Corollary 1. Assume that Assumption A2 holds and

$$\max_{j=1\dots J-1} \frac{\sigma_{j+1}^2}{\sigma_j^2} \le 1 + \eta_\epsilon, \quad \text{for } 0 < \eta_\epsilon < 1/2.$$    (4.4)

Let $r > 0$ be fixed and $N \ge \max\{m : \sum_{k=1}^{m} b_k^{-2} \le r^2\epsilon^{-2}\eta_\epsilon^{-1}\}$. Then, for any
$\theta \in \ell^2$ such that $\|\theta\| \le r$, we have:

$$E_\theta\|\theta^* - \theta\|^2 \le (1 + \tilde\tau_\epsilon) \inf_{\lambda \in \Lambda_{mon}} R(\theta, \lambda) + C_3\epsilon^2,$$

where $C_3$ denotes a positive constant independent of $\epsilon$,
$\tilde\tau_\epsilon = (2\eta_\epsilon + \tau_\epsilon)/(1 - 2\eta_\epsilon)$ and $\tau_\epsilon$ is introduced in Theorem 4.

The proof is a direct consequence of Lemma 1 of (8).

5. The choice of the penalty 

Following Theorems 2 and 4, the penalized blockwise Stein's estimator is
derived from the risk hull method. In the previous section, we have presented
conditions on the penalty $(\mathrm{pen}_j)_j$ that lead to sharp oracle inequalities. In this
context, the following question naturally arises: what is the smallest possible
hull, i.e. may the criterion constructed in (8) be refined?

The Stein hull $V(\theta, \lambda)$ being, up to some constants, the sum of the quadratic
risk and a penalty, the previous question may be transferred to the penalty: how
can we choose this quantity? This question is quite difficult. The answer cannot
be summarized in a single paper. This section only tries to shed some light on
the properties of the penalty.

Concerning the sequence $(b_k)_{k\in\mathbb{N}}$, we set the following condition:

Assumption B1: There exists $\beta > 0$ such that $(b_k)_{k\in\mathbb{N}} \asymp (k^{-\beta})_{k\in\mathbb{N}}$, i.e. there
exist $\underline{b}$ and $\bar b$ independent of $k$ such that $\underline{b}k^{-\beta} \le b_k \le \bar b k^{-\beta}$, for all $k \in \mathbb{N}$.

The eigenvalues are supposed to be polynomially decreasing. The problem is
said to be mildly ill-posed.

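Assume that B1 holds. Then, for all $j \in \{1 \dots J\}$:

$$\sigma_j^2 \asymp \epsilon^2 K_j^{2\beta}(K_{j+1} - K_j)$$

and

$$\Sigma_j \asymp \epsilon^2 K_j^{2\beta}(K_{j+1} - K_j)^{1/2},$$

provided $K_{j+1}/K_j \to 1$ as $j \to +\infty$. In Section 4, we deduced an admissible
range (in the oracle sense) for the penalty:

$$\Sigma_j \lesssim \mathrm{pen}_j \lesssim \sigma_j^2, \quad \forall j \in \{1, \dots, J\},$$

which, under Assumption B1, is equivalent to:

$$\epsilon^2 K_j^{2\beta}(K_{j+1} - K_j)^{1/2} \lesssim \mathrm{pen}_j \lesssim \epsilon^2 K_j^{2\beta}(K_{j+1} - K_j), \quad \forall j \in \{1, \dots, J\}.$$

Since all the penalties in this admissible range may lead to a sharp oracle
inequality, one may be interested in finding the smallest possible one, or at
least one as close as possible to the lower bound of this range.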
For all $j \in \{1, \dots, J\}$, the penalty of (8) is:

$$\mathrm{pen}_j = \varphi_j\sigma_j^2 = \Delta_j^\gamma\sigma_j^2 \asymp \epsilon^2 K_j^{2\beta}(K_{j+1} - K_j)^{1-\gamma}, \quad \text{with } 0 < \gamma < 1/2.$$

For $\gamma = 1/2$, Assumption A1 does not hold. The penalized blockwise Stein's
estimator is not sharp in the sense of (1.6). Remark that this particular choice
exactly corresponds to the lower bound of the range (4.3).

The principle of risk hull minimization may lead to more accurate choices.
The only restriction on $(\mathrm{pen}_j)_j$ from the risk hull point of view is expressed
through Assumption A2:

$$\sum_{j=1}^{J} E_\theta[\eta_j - \mathrm{pen}_j]_+ \le C_2\epsilon^2,$$

for some positive constant $C_2$. Since $E_\theta[\eta_j - u]_+ \le E_\theta \eta_j 1_{\{\eta_j > u\}}$ for all
positive $u$, we may be interested in the penalty:

$$\mathrm{pen}_j = (1 + \alpha) \inf\{u : E_\theta \eta_j 1_{\{\eta_j > u\}} \le \epsilon^2\}, \quad \forall j \in \{1, \dots, J\},$$    (5.1)

for some $\alpha > 0$.
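
One way to approximate the penalty (5.1) numerically is by Monte Carlo simulation of $\eta_j$. The sketch below is an illustration under stated assumptions: the infimum is restricted to $u \ge 0$ (as in the inequality preceding (5.1)), the expectation is replaced by an empirical mean, and the function name `mc_penalty`, the block values and the number of draws are hypothetical.

```python
import numpy as np


def mc_penalty(b_block, eps, alpha, n_draws, rng):
    """Monte Carlo approximation of (1 + alpha) * inf{u >= 0 : E eta 1{eta > u} <= eps^2}.

    eta = eps^2 * sum_k b_k^{-2} (xi_k^2 - 1) over the block, xi_k i.i.d. N(0, 1).
    Returns the pair (penalty, threshold u).
    """
    weights = eps ** 2 / b_block ** 2
    xi2 = rng.standard_normal((n_draws, b_block.size)) ** 2
    eta = (xi2 - 1.0) @ weights

    pos = np.sort(eta[eta > 0.0])                  # positive draws, ascending
    # tail[i] = empirical mean contribution of the draws pos[i], pos[i+1], ...
    tail = np.cumsum(pos[::-1])[::-1] / n_draws
    # g_hat evaluated at the candidates u = 0, pos[0], pos[1], ...
    g_hat = np.append(tail, 0.0)
    candidates = np.concatenate(([0.0], pos))
    # g_hat is non-increasing and ends at 0, so a valid candidate always exists.
    u = candidates[np.argmax(g_hat <= eps ** 2)]
    return (1.0 + alpha) * u, u


rng = np.random.default_rng(1)
b_block = np.array([0.5, 0.4, 0.3])   # hypothetical singular values on one block
pen, u = mc_penalty(b_block, eps=0.1, alpha=0.5, n_draws=20000, rng=rng)
print(pen, u)
```

The empirical function is a step function that only changes at the simulated values, so scanning the sorted positive draws is enough to locate the threshold. The accuracy depends on the number of draws, in particular for blocks where the relevant event is rare.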

In the following, we prove that the sequence $(\mathrm{pen}_j)_{j=1\dots J}$ defined in (5.1)
leads to a sharp oracle inequality. In particular, Assumption A2 holds, i.e. the 
penalty contains the variability of the problem. For the sake of convenience, we 
restrict ourselves to one specific type of blocks. All the results presented in the 
sequel hold for other constructions (see for instance (7)). We leave the proof to 
the interested reader. 

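Let $\nu_\epsilon = \lceil \log \epsilon^{-1} \rceil$ and $\kappa_\epsilon = \log^{-1} \nu_\epsilon$, where for all $x \in \mathbb{R}$, $\lceil x \rceil$ denotes the
minimal integer strictly greater than $x$. Define the sequence $(T_j)_{j=1\dots J}$ by: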






(5.2) 



and the bandwidth $J$ as:

$$J = \min\{j : K_j > N\}, \quad \text{with } N = \max\Big\{m : \sum_{k=1}^{m} b_k^{-2} \le \epsilon^{-2}\kappa_\epsilon^{-3}\Big\}.$$    (5.3)

Assumption B2: The length of the blocks is defined by the sequence $(T_j)_{j=1\dots J}$
where $J$ satisfies (5.3) and the terms $T_j$ are defined in (5.2). For all
$j \in \{1, \dots, J\}$, the penalty is $\mathrm{pen}_j = (1 + \alpha)U_j$, for some $\alpha > 0$.

In practice, the penalty (5.1) can be computed using Monte-Carlo approxima- 
tions. We may also use the following lower bound: 
Lemma 2. Assume that Assumptions B1 and B2 hold. Then there exists a
constant $C$ independent of $j$ and $\epsilon$ such that:

$$U_j = \inf\{u : E_\theta \eta_j 1_{\{\eta_j > u\}} \le \epsilon^2\} \ge \sqrt{2\Sigma_j^2 \log(C\epsilon^{-4}\Sigma_j^2)},$$    (5.4)

where for all $j \in \{1 \dots J\}$, $\Sigma_j^2$ is defined in (4.2).

The proof can be directly derived from Lemma 1 of (5). Now, if Assumption
B1 holds:

$$\mathrm{pen}_j \gtrsim \epsilon^2 K_j^{2\beta}(K_{j+1} - K_j)^{1/2} \sqrt{\log(C\epsilon^{-4}\Sigma_j^2)}, \quad \forall j \in \{1, \dots, J\}.$$

This penalty reaches, up to a log term, the lower bound of the range (4.3). As
we will see, the related hull $V(\theta, \lambda)$ is close to the smallest possible one,
provided Assumption B1 holds.

The next corollary establishes that the sequence $((1 + \alpha)U_j)_{j=1\dots J}$, where the $U_j$
are defined in (5.4), is a relevant choice for the penalty.

Corollary 2. Let $\theta^*$ be the estimator introduced in (2.6). Assume that Assumptions
B1 and B2 hold. Then,

$$E_\theta\|\theta^* - \theta\|^2 \le (1 + \gamma_\epsilon) \inf_{\lambda \in \Lambda^*} R(\theta, \lambda) + \frac{C_4}{\alpha}\epsilon^2,$$

where $C_4$ denotes a positive constant independent of $\epsilon$ and $\gamma_\epsilon = o(1)$ as $\epsilon \to 0$.

PROOF. From Theorems 2 and 4, we only have to prove that Assumption
A2 holds since $\max_j \mathrm{pen}_j/\sigma_j^2$ converges to $0$ as $\epsilon \to 0$. For all $j \in \{1, \dots, J\}$,
using Lemma 1 in Section 4:

$$E[\eta_j - \mathrm{pen}_j]_+ \le \frac{1}{\delta} \exp\Big\{ -\delta\,\mathrm{pen}_j + \delta^2\Sigma_j^2 + 4\delta^3 \sum_{k \in I_j} \frac{\epsilon^6 b_k^{-6}}{1 - 2\delta\epsilon^2 b_k^{-2}} \Big\},$$    (5.5)

for all $0 < \delta < \epsilon^{-2}b_{K_j-1}^2/2$. Setting



^^ /l0g(C6-4S2) 



2^ 



and using Lemma 2, we obtain: 

E[tJj -pcnj] + 



< 



21] 



log(Ce-4s2) 



4V2A °^P {^ log(Ce-4S]2)| ^ ^^p |_(^ ^ ^) logiCe-^^)} 



- ^^V log(c'-^^-) ^^^^'"^"^^^^"^-^^^- 
Indeed, provided (5.2), (5.3) and Assumption B1 hold, $\delta\epsilon^2 b_{K_j-1}^{-2}$ and the last
term on the right-hand side of the exponential in (5.5) converge to $0$ as
$j \to +\infty$. Hence, we eventually obtain:

,7 J 

^E[r,,-pcn,]+ < Ce^^-^^j— — exp{-alog(CT,)}, 



< Ce2^j-i/2exp{-alog(Cz.,(l + K,)^)} 

+ 00 „ 2 



i=i 



where $D$ and $C$ denote two positive constants independent of $\epsilon$. This concludes
the proof of Corollary 2.

□

From Corollary 2, the penalty $(\mathrm{pen}_j)_{j=1\dots J}$ leads to a sharp oracle inequality.
Using the same bounds as in the proof, it seems hopeless to obtain a similar
result with the penalty $(\Sigma_j)_{j=1\dots J}$. Remark that with our assumptions, this
choice exactly corresponds to the penalty of (8) with $\gamma = 1/2$. From (5.5),
Assumption A2 will not be satisfied. However, the bound proposed by
Lemma 1 may perhaps be improved or Assumption A2 relaxed.

In a certain sense, the risk hull point of view quantifies the effects of the
penalty. We have presented an admissible range for this quantity and proposed
a choice close to the lower bound of this range. From now on, the question is:
'What is a good penalty?'. A small penalty leads to a sharp hull (see Theorem 2)
but the constant $C^*$ in Theorem 4 may be large. On the other hand, large
penalties perfectly contain the variability of the problem but the hull is less
precise. Hence, it is not clear that such a problem may be easily solved. This
was not the goal of the present paper.

To conclude, it seems necessary to discuss the role played by
the constant $\alpha$ in the penalty $(\mathrm{pen}_j)_{j=1\dots J}$. Assumption A2 does not hold for
$\alpha = 0$. On the other hand, the proof of Corollary 2 indicates that large values
of $\alpha$ will not lead to an accurate recovery. The choice of $\alpha$ has already been
discussed and illustrated via some numerical simulations in a slightly different
setting: see (5) or (17) for more details. Remark that we do not require $\alpha$ to
be greater than 1 in this paper. This is a small difference with the constraints
expressed in a regularization parameter choice scheme. This can be explained
by the blockwise structure of the variables $(\eta_j)_{j=1\dots J}$.
6. Proofs and technical lemmas

6.1. Ordered processes 

Ordered processes were introduced in (16). The more recent paper (4) studies
these processes in detail and provides very interesting tools. These stochastic
objects may play an important role in adaptive estimation: see in particular
(13) or (17) for more details.

The aim of this section is not to provide an exhaustive presentation of this 
theory but rather to introduce some definitions and useful properties. 

Definition 1. Let $\zeta(t)$, $t \ge 0$ be a separable random process with $E\zeta(t) = 0$ and
finite variance $\Sigma^2(t)$. It is called ordered if for all $t_2 \ge t_1 \ge 0$,

$$\Sigma^2(t_2) \ge \Sigma^2(t_1), \quad \text{and} \quad E[\zeta(t_2) - \zeta(t_1)]^2 \le \Sigma^2(t_2) - \Sigma^2(t_1).$$    (6.1)

Let $\xi$ be a standard Gaussian random variable. The process $t \mapsto \xi t$ is the
simplest example of an ordered process. Wiener processes are also covered by
Definition 1. The family of ordered processes is in fact quite large.
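
The first example can be checked directly against Definition 1. The following short verification, writing $\zeta(t) = \xi t$ with $\xi$ standard Gaussian, is added as an illustration and uses nothing beyond (6.1).

```latex
% Verification that zeta(t) = xi * t satisfies the two conditions in (6.1).
\begin{align*}
\Sigma^2(t) &= E[\xi t]^2 = t^2, \\
E[\zeta(t_2) - \zeta(t_1)]^2 &= (t_2 - t_1)^2, \\
\Sigma^2(t_2) - \Sigma^2(t_1) - (t_2 - t_1)^2 &= 2t_1(t_2 - t_1) \ge 0
  \qquad (t_2 \ge t_1 \ge 0).
\end{align*}
```

The variance is non-decreasing on $t \ge 0$ and the last line gives the increment condition, so both requirements of Definition 1 hold.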

Assumption C1. There exists $\kappa > 0$ such that:

$$\varphi(\kappa) = \sup_{t_1, t_2} \mathbb{E} \exp\left\{ \kappa \, \frac{\zeta(t_1) - \zeta(t_2)}{\sqrt{\mathbb{E}[\zeta(t_1) - \zeta(t_2)]^2}} \right\} < +\infty. \tag{6.2}$$

This assumption is not very restrictive. Several processes encountered in linear 
estimation satisfy this hypothesis. 

The proof of the following result can be found in (4). 

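Lemma 3. Let $\zeta(t)$, $t \geq 0$, be an ordered process satisfying $\zeta(0) = 0$ and Assumption
C1. There exists a constant $C = C(\kappa)$ such that for all $\gamma > 0$:

$$\mathbb{E} \sup_{t \geq 0} \left[ \zeta(t) - \gamma \Sigma^2(t) \right]_+ \leq \frac{C}{\gamma}.$$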

This lemma is rather important in the theory of ordered processes and leads
to several interesting results. In particular, the following corollary will often
be used in the proofs.

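Corollary 3. Let $\zeta(t)$, $t \geq 0$, be an ordered process satisfying $\zeta(0) = 0$ and
Assumption C1. Consider $\hat{t}$ measurable with respect to $\zeta$. Then, there exists
$C = C(\kappa) > 0$ such that:

$$\mathbb{E}\zeta(\hat{t}) \leq C \sqrt{\mathbb{E}\Sigma^2(\hat{t})}.$$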

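PROOF. Let $\gamma > 0$ be fixed. Using Lemma 3,

$$\begin{aligned}
\mathbb{E}\zeta(\hat{t}) &= \mathbb{E}\zeta(\hat{t}) - \gamma \mathbb{E}\Sigma^2(\hat{t}) + \gamma \mathbb{E}\Sigma^2(\hat{t}), \\
&\leq \mathbb{E} \sup_{t \geq 0} \left[ \zeta(t) - \gamma \Sigma^2(t) \right]_+ + \gamma \mathbb{E}\Sigma^2(\hat{t}), \\
&\leq \frac{C}{\gamma} + \gamma \mathbb{E}\Sigma^2(\hat{t}).
\end{aligned}$$

Choose $\gamma = \left[\mathbb{E}\Sigma^2(\hat{t})\right]^{-1/2}$ in order to conclude the proof.

□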
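
The last step of the proof can be illustrated numerically. The sketch below is only an illustration with hypothetical names and arbitrary values of $C$ and $S = \mathbb{E}\Sigma^2(\hat{t})$. It evaluates the bound $C/\gamma + \gamma S$ at the choice $\gamma = S^{-1/2}$, which gives $(C + 1)\sqrt{S}$, a quantity of the form stated in Corollary 3.

```python
import math


def bound(gamma, c, s):
    """Right-hand side C/gamma + gamma*S from the proof of Corollary 3."""
    return c / gamma + gamma * s


def bound_with_proof_choice(c, s):
    """Evaluate the bound at gamma = S^(-1/2), the choice made in the proof."""
    gamma = 1.0 / math.sqrt(s)
    return bound(gamma, c, s)


c, s = 3.0, 4.0  # arbitrary illustrative values, not taken from the paper
print(bound_with_proof_choice(c, s), (c + 1.0) * math.sqrt(s))
```

This choice of $\gamma$ is not the minimizer of the bound (that would be $\gamma = \sqrt{C/S}$), but it is enough to obtain a bound proportional to $\sqrt{S}$.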



6.2. Proofs of Theorems 2-4 

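Proof of Theorem 2. First, remark that:

$$\begin{aligned}
&\mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \|\hat\theta_\lambda - \theta\|^2 - V(\theta, \lambda) \right\} \\
&\quad = \mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \sum_{k=1}^{+\infty} (1 - \lambda_k)^2 \theta_k^2 + \epsilon^2 \sum_{k=1}^{+\infty} \lambda_k^2 b_k^{-2} \xi_k^2 - 2\epsilon \sum_{k=1}^{+\infty} \lambda_k (1 - \lambda_k) \theta_k b_k^{-1} \xi_k - V(\theta, \lambda) \right\}, \\
&\quad = \mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \sum_{j=1}^{J} \left[ (1 - \lambda_j)^2 \|\theta\|_{(j)}^2 + \lambda_j^2 \sum_{k \in I_j} \epsilon^2 b_k^{-2} \xi_k^2 - 2\lambda_j (1 - \lambda_j) X_j \right] + \sum_{k > N} \theta_k^2 - V(\theta, \lambda) \right\}, \\
&\quad = \mathbb{E}_\theta \left\{ \sum_{j=1}^{J} \left[ (1 - \bar\lambda_j)^2 \|\theta\|_{(j)}^2 + \bar\lambda_j^2 \sum_{k \in I_j} \epsilon^2 b_k^{-2} \xi_k^2 + 2\bar\lambda_j (\bar\lambda_j - 1) X_j \right] + \sum_{k > N} \theta_k^2 - V(\theta, \bar\lambda) \right\},
\end{aligned}$$

with

$$\bar\lambda = \arg\sup_{\lambda \in \Lambda^*} \left\{ \|\hat\theta_\lambda - \theta\|^2 - V(\theta, \lambda) \right\},$$

and

$$X_j = \epsilon \sum_{k \in I_j} \theta_k b_k^{-1} \xi_k, \quad \forall j \in \{1, \dots, J\}. \tag{6.3}$$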

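Let $j \in \{1, \dots, J\}$ be fixed. Use the decomposition:

$$\begin{aligned}
\mathbb{E}_\theta 2\bar\lambda_j (\bar\lambda_j - 1) X_j &= \mathbb{E}_\theta \bar\lambda_j^2 X_j + \mathbb{E}_\theta (\bar\lambda_j^2 - 2\bar\lambda_j) X_j, \\
&= \mathbb{E}_\theta \bar\lambda_j^2 X_j + \mathbb{E}_\theta (1 - \bar\lambda_j)^2 X_j = A_j^1 + A_j^2,
\end{aligned} \tag{6.4}$$

since $\mathbb{E}_\theta X_j = 0$. First consider $A_j^1$. Let $\lambda_j^0$ denote the blockwise constant oracle
on the block $j$. Using Corollary 3 in Section 6.1:

$$A_j^1 = \mathbb{E}_\theta \bar\lambda_j^2 X_j = \mathbb{E}_\theta \left[ \bar\lambda_j^2 - (\lambda_j^0)^2 \right] X_j \leq C \sqrt{ \mathbb{E}_\theta \left[ \bar\lambda_j^2 - (\lambda_j^0)^2 \right]^2 \sum_{k \in I_j} \epsilon^2 \theta_k^2 b_k^{-2} }, \tag{6.5}$$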
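where $C > 0$ denotes a positive constant. Indeed, both processes $\zeta_1 : t \mapsto [t^2 - (\lambda_j^0)^2] X_j$,
$t \in [\lambda_j^0, 1]$ and $\zeta_2 : t \mapsto [t^{-2} - (\lambda_j^0)^2] X_j$, $t \in [(\lambda_j^0)^{-1}, +\infty[$ are
ordered and satisfy Assumption C1. For all $\gamma > 0$, using:

$$\left[ \bar\lambda_j^2 - (\lambda_j^0)^2 \right]^2 \leq 4 \left[ (1 - \bar\lambda_j)^2 + (1 - \lambda_j^0)^2 \right] \left( \bar\lambda_j^2 + (\lambda_j^0)^2 \right) \tag{6.6}$$

and the Cauchy-Schwarz and Young inequalities:

$$\begin{aligned}
A_j^1 &\leq C \sqrt{ \mathbb{E}_\theta \left[ (1 - \bar\lambda_j)^2 + (1 - \lambda_j^0)^2 \right] \left( \bar\lambda_j^2 + (\lambda_j^0)^2 \right) \max_{k \in I_j} \epsilon^2 b_k^{-2} \|\theta\|_{(j)}^2 }, \\
&\leq C \mathbb{E}_\theta \left[ \gamma (1 - \bar\lambda_j)^2 \|\theta\|_{(j)}^2 + \gamma^{-1} \Delta_j \bar\lambda_j^2 \sigma_j^2 \right] + C \gamma (1 - \lambda_j^0)^2 \|\theta\|_{(j)}^2 \\
&\quad + C \gamma^{-1} \Delta_j (\lambda_j^0)^2 \sigma_j^2 + C \sqrt{ \mathbb{E}_\theta (1 - \bar\lambda_j)^2 \bar\lambda_j^2 \max_{k \in I_j} \epsilon^2 b_k^{-2} \|\theta\|_{(j)}^2 },
\end{aligned} \tag{6.7}$$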



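for some positive constant $C$. The last term on the right-hand side of (6.7)
must be bounded with care. First, suppose that:

$$\|\theta\|_{(j)}^2 \leq \sigma_j^2. \tag{6.8}$$

In such a situation, for all $\gamma > 0$:

$$\begin{aligned}
\sqrt{ \mathbb{E}_\theta (1 - \bar\lambda_j)^2 \bar\lambda_j^2 \max_{k \in I_j} \epsilon^2 b_k^{-2} \|\theta\|_{(j)}^2 } &\leq \sqrt{ \|\theta\|_{(j)}^2 \, \mathbb{E}_\theta \bar\lambda_j^2 \max_{k \in I_j} \epsilon^2 b_k^{-2} }, \\
&\leq \gamma \|\theta\|_{(j)}^2 + \gamma^{-1} \Delta_j \mathbb{E}_\theta \bar\lambda_j^2 \sigma_j^2.
\end{aligned}$$

If (6.8) holds, then:

$$\|\theta\|_{(j)}^2 \leq 2 \, \frac{\sigma_j^2 \|\theta\|_{(j)}^2}{\sigma_j^2 + \|\theta\|_{(j)}^2} = 2 \left\{ (1 - \lambda_j^0)^2 \|\theta\|_{(j)}^2 + (\lambda_j^0)^2 \sigma_j^2 \right\}, \tag{6.9}$$

where $\lambda^0$ is the oracle defined in (2.2). Indeed,

$$\lambda_j^0 = \frac{\|\theta\|_{(j)}^2}{\sigma_j^2 + \|\theta\|_{(j)}^2}, \quad \forall j \in \{1, \dots, J\}.$$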



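Now, suppose:

$$\|\theta\|_{(j)}^2 > \sigma_j^2. \tag{6.10}$$

Then, for all $\gamma > 0$:

$$\begin{aligned}
\sqrt{ \mathbb{E}_\theta (1 - \bar\lambda_j)^2 \bar\lambda_j^2 \max_{k \in I_j} \epsilon^2 b_k^{-2} \|\theta\|_{(j)}^2 } &\leq \sqrt{ \max_{k \in I_j} \epsilon^2 b_k^{-2} \, \mathbb{E}_\theta (1 - \bar\lambda_j)^2 \|\theta\|_{(j)}^2 }, \\
&\leq \gamma \mathbb{E}_\theta (1 - \bar\lambda_j)^2 \|\theta\|_{(j)}^2 + \gamma^{-1} \Delta_j \sigma_j^2.
\end{aligned}$$

Using (6.10):

$$\sigma_j^2 \leq 2 \, \frac{\sigma_j^2 \|\theta\|_{(j)}^2}{\sigma_j^2 + \|\theta\|_{(j)}^2} = 2 \left\{ (1 - \lambda_j^0)^2 \|\theta\|_{(j)}^2 + (\lambda_j^0)^2 \sigma_j^2 \right\}.$$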
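
The two case bounds rest on one elementary fact about the blockwise oracle: with $\lambda_j^0 = \|\theta\|_{(j)}^2 / (\sigma_j^2 + \|\theta\|_{(j)}^2)$, the block risk $(1 - \lambda_j^0)^2 \|\theta\|_{(j)}^2 + (\lambda_j^0)^2 \sigma_j^2$ equals $\sigma_j^2 \|\theta\|_{(j)}^2 / (\sigma_j^2 + \|\theta\|_{(j)}^2)$, and the smaller of $\|\theta\|_{(j)}^2$ and $\sigma_j^2$ is at most twice this value. The following sketch checks this on a few arbitrary values. The function names and test values are hypothetical and serve only as an illustration.

```python
def oracle_weight(theta_sq, sigma_sq):
    """Blockwise oracle weight ||theta||^2 / (sigma^2 + ||theta||^2)."""
    return theta_sq / (sigma_sq + theta_sq)


def block_risk(lam, theta_sq, sigma_sq):
    """Block risk (1 - lam)^2 ||theta||^2 + lam^2 sigma^2."""
    return (1.0 - lam) ** 2 * theta_sq + lam ** 2 * sigma_sq


def check_case_bounds(theta_sq, sigma_sq, tol=1e-12):
    """Check the oracle risk identity and the bound used in both cases."""
    lam0 = oracle_weight(theta_sq, sigma_sq)
    risk = block_risk(lam0, theta_sq, sigma_sq)
    closed_form = sigma_sq * theta_sq / (sigma_sq + theta_sq)
    identity_ok = abs(risk - closed_form) <= tol
    # Case (6.8) bounds ||theta||^2, case (6.10) bounds sigma^2.
    bound_ok = min(theta_sq, sigma_sq) <= 2.0 * risk + tol
    return identity_ok and bound_ok


for theta_sq, sigma_sq in [(1.0, 3.0), (3.0, 1.0), (2.0, 2.0), (0.01, 5.0)]:
    print(theta_sq, sigma_sq, check_case_bounds(theta_sq, sigma_sq))
```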
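Setting $\gamma = \sqrt{\Delta_j}$, we eventually obtain:

$$A_j^1 \leq C \sqrt{\Delta_j} \, \mathbb{E}_\theta \left[ (1 - \bar\lambda_j)^2 \|\theta\|_{(j)}^2 + \bar\lambda_j^2 \sigma_j^2 \right] + C \sqrt{\Delta_j} \left\{ (1 - \lambda_j^0)^2 \|\theta\|_{(j)}^2 + (\lambda_j^0)^2 \sigma_j^2 \right\}, \tag{6.11}$$

for some constant $C > 0$ independent of $\epsilon$. The same bound occurs for the term
$A_j^2$ in (6.4). Hence, there exists $B > 0$ independent of $\epsilon$ such that:

$$\begin{aligned}
&\mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \|\hat\theta_\lambda - \theta\|^2 - V(\theta, \lambda) \right\} \\
&\quad \leq \mathbb{E}_\theta \sum_{j=1}^{J} \left[ (1 + B\rho_\epsilon)(1 - \bar\lambda_j)^2 \|\theta\|_{(j)}^2 + \bar\lambda_j^2 \sum_{k \in I_j} \epsilon^2 b_k^{-2} \xi_k^2 + B\rho_\epsilon \bar\lambda_j^2 \sigma_j^2 \right] \\
&\qquad + \sum_{k > N} \theta_k^2 + B\rho_\epsilon R(\theta, \lambda^0) - \mathbb{E}_\theta V(\theta, \bar\lambda), \\
&\quad \leq \mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \sum_{j=1}^{J} \left[ (1 + B\rho_\epsilon)(1 - \lambda_j)^2 \|\theta\|_{(j)}^2 + \lambda_j^2 \sum_{k \in I_j} \epsilon^2 b_k^{-2} \xi_k^2 + B\rho_\epsilon \lambda_j^2 \sigma_j^2 \right] \right. \\
&\qquad \left. + \sum_{k > N} \theta_k^2 + B\rho_\epsilon R(\theta, \lambda^0) - V(\theta, \lambda) \right\},
\end{aligned}$$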

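where $\rho_\epsilon$ is defined in (3.4). Now, using (3.3) and (3.6),

$$\begin{aligned}
\mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \|\hat\theta_\lambda - \theta\|^2 - V(\theta, \lambda) \right\} &\leq \mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \sum_{j=1}^{J} \left[ \lambda_j^2 \eta_j - 2\lambda_j \mathrm{pen}_j \right] \right\} - C_2 \epsilon^2, \\
&= \sum_{j=1}^{J} \mathbb{E}_\theta \sup_{\lambda_j \in [0,1]} \left[ \lambda_j^2 \eta_j - 2\lambda_j \mathrm{pen}_j \right] - C_2 \epsilon^2.
\end{aligned}$$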

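Let $j \in \{1, \dots, J\}$ be fixed. We are looking for $\lambda_j \in [0, 1]$ that maximizes the
quantity $\lambda_j^2 \eta_j - 2\lambda_j \mathrm{pen}_j$. If $\eta_j < 0$, the function $\lambda \mapsto \lambda^2 \eta_j - 2\lambda \mathrm{pen}_j$ is concave
and the maximum on $[0, 1]$ is attained for $\lambda = 0$. Now, if $\eta_j > 0$, the function
$\lambda \mapsto \lambda^2 \eta_j - 2\lambda \mathrm{pen}_j$ is convex and the maximum on $[0, 1]$ is attained at 0 or at
1. Therefore:

$$\sup_{\lambda_j \in [0,1]} \left\{ \lambda_j^2 \eta_j - 2\lambda_j \mathrm{pen}_j \right\} = \left[ \eta_j - 2\mathrm{pen}_j \right]_+. \tag{6.12}$$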
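
The closed form of this supremum can be checked numerically. The sketch below is an illustration only. It assumes a nonnegative penalty, and the function names, the grid size and the test values are hypothetical. It compares a grid maximization of $\lambda^2 \eta - 2\lambda\,\mathrm{pen}$ over $[0, 1]$ with the positive part $[\eta - 2\,\mathrm{pen}]_+$.

```python
def sup_on_grid(eta, pen, n=1001):
    """Maximize lam^2 * eta - 2 * lam * pen over a regular grid of [0, 1]."""
    grid = [i / (n - 1) for i in range(n)]
    return max(lam * lam * eta - 2.0 * lam * pen for lam in grid)


def closed_form(eta, pen):
    """Positive part [eta - 2 * pen]_+."""
    return max(eta - 2.0 * pen, 0.0)


# Arbitrary (eta, pen) pairs with pen >= 0 covering the concave
# (eta < 0) and convex (eta > 0) cases discussed in the proof.
cases = [(-1.0, 0.5), (0.5, 1.0), (3.0, 1.0), (2.0, 1.0), (5.0, 0.0)]
for eta, pen in cases:
    print(eta, pen, sup_on_grid(eta, pen), closed_form(eta, pen))
```

Since the maximum is attained at an endpoint of $[0, 1]$ in every case, and both endpoints belong to the grid, the grid value agrees with the closed form up to rounding.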

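Using Assumption A2, we eventually obtain:

$$\begin{aligned}
\mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \|\hat\theta_\lambda - \theta\|^2 - V(\theta, \lambda) \right\} &\leq \sum_{j=1}^{J} \mathbb{E}_\theta \left[ \eta_j - 2\mathrm{pen}_j \right]_+ - C_2 \epsilon^2, \\
&\leq \sum_{j=1}^{J} \mathbb{E}_\theta \left[ \eta_j - \mathrm{pen}_j \right]_+ - C_2 \epsilon^2 \leq 0.
\end{aligned}$$

This concludes the proof of Theorem 2.

□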



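Proof of Theorem 3. Using the same algebra as in the proof of Theorem 2, we
obtain:

$$\mathbb{E}_\theta \sup_{\lambda \in \Lambda^*} \left\{ \|\hat\theta_\lambda - \theta\|^2 - W(\theta, \lambda) \right\} \leq \sum_{j=1}^{J} \mathbb{E}_\theta \sup_{\lambda_j \in [0,1]} \left\{ \lambda_j^2 \eta_j - \lambda_j^2 \mathrm{pen}_j \right\} - C_2 \epsilon^2.$$

For all $j \in \{1, \dots, J\}$, the only difference with $V(\theta, \lambda)$ is contained in the bound
of:

$$\sup_{\lambda_j \in [0,1]} \left\{ \lambda_j^2 \eta_j - \lambda_j^2 \mathrm{pen}_j \right\} \leq \left[ \eta_j - \mathrm{pen}_j \right]_+.$$

Using Assumption A2, we obtain Theorem 3.

□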



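Proof of Theorem 4. In the situation where Assumption A2 holds, Theorem
3 provides that:

$$\mathbb{E}_\theta \|\theta^\star - \theta\|^2 \leq \mathbb{E}_\theta W(\theta, \lambda^*) = (1 + B\rho_\epsilon) \mathbb{E}_\theta R_p(\theta, \lambda^*) + B\rho_\epsilon R(\theta, \lambda^0) + C_2 \epsilon^2, \tag{6.13}$$

where:

$$R_p(\theta, \lambda^*) = \sum_{j=1}^{J} \left[ (1 - \lambda_j^*)^2 \|\theta\|_{(j)}^2 + (\lambda_j^*)^2 \sigma_j^2 + (\lambda_j^*)^2 \mathrm{pen}_j \right] + \sum_{k > N} \theta_k^2,$$

and $B$ denotes a positive constant independent of $\epsilon$. Moreover, from (4.1),

$$U_p(y, \lambda^*) \leq U_p(y, \lambda), \quad \forall \lambda \in \Lambda^*.$$

The proof of Theorem 4 is mainly based on these two inequalities. First remark
that:

$$\begin{aligned}
&U_p(y, \lambda^*) - R_p(\theta, \lambda^*) \\
&\quad = \sum_{j=1}^{J} \left[ \{(\lambda_j^*)^2 - 2\lambda_j^*\} (\|\tilde{y}\|_{(j)}^2 - \sigma_j^2) + (\lambda_j^*)^2 \sigma_j^2 + 2\lambda_j^* \mathrm{pen}_j - (1 - \lambda_j^*)^2 \|\theta\|_{(j)}^2 \right. \\
&\qquad \left. - (\lambda_j^*)^2 \sigma_j^2 - (\lambda_j^*)^2 \mathrm{pen}_j \right] - \sum_{k > N} \theta_k^2, \\
&\quad = \sum_{j=1}^{J} \left[ \{(\lambda_j^*)^2 - 2\lambda_j^*\} (\|\tilde{y}\|_{(j)}^2 - \sigma_j^2) - (1 - \lambda_j^*)^2 \|\theta\|_{(j)}^2 + \{2\lambda_j^* - (\lambda_j^*)^2\} \mathrm{pen}_j \right] - \sum_{k > N} \theta_k^2.
\end{aligned}$$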
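Hence,

$$\begin{aligned}
&U_p(y, \lambda^*) - R_p(\theta, \lambda^*) \\
&\quad = \sum_{j=1}^{J} \left[ \{(\lambda_j^*)^2 - 2\lambda_j^*\} \sum_{k \in I_j} \left[ \theta_k^2 + \epsilon^2 b_k^{-2} (\xi_k^2 - 1) + 2\epsilon b_k^{-1} \xi_k \theta_k \right] - (1 - \lambda_j^*)^2 \|\theta\|_{(j)}^2 \right. \\
&\qquad \left. + \{2\lambda_j^* - (\lambda_j^*)^2\} \mathrm{pen}_j \right] - \sum_{k > N} \theta_k^2, \\
&\quad = \sum_{j=1}^{J} \{(\lambda_j^*)^2 - 2\lambda_j^*\} (\eta_j + 2X_j - \mathrm{pen}_j) - \|\theta\|^2,
\end{aligned}$$

where $\eta_j$ and $X_j$ are respectively defined in (3.3) and (6.3). Hence, from (4.1),

$$\begin{aligned}
R_p(\theta, \lambda^*) &= U_p(y, \lambda^*) + \|\theta\|^2 + \sum_{j=1}^{J} \{2\lambda_j^* - (\lambda_j^*)^2\} (\eta_j + 2X_j - \mathrm{pen}_j), \\
&\leq U_p(y, \lambda^p) + \|\theta\|^2 + \sum_{j=1}^{J} \{2\lambda_j^* - (\lambda_j^*)^2\} (\eta_j + 2X_j - \mathrm{pen}_j),
\end{aligned} \tag{6.14}$$

where

$$\lambda^p = \arg\inf_{\lambda \in \Lambda^*} R_p(\theta, \lambda),$$

and $R_p(\theta, \lambda)$ is defined in (3.7). Then, with simple algebra:

$$\begin{aligned}
\mathbb{E}_\theta U_p(y, \lambda^p) &= \mathbb{E}_\theta \sum_{j=1}^{J} \left[ \{(\lambda_j^p)^2 - 2\lambda_j^p\} (\|\tilde{y}\|_{(j)}^2 - \sigma_j^2) + (\lambda_j^p)^2 \sigma_j^2 + 2\lambda_j^p \mathrm{pen}_j \right] \\
&= R_p(\theta, \lambda^p) - \|\theta\|^2.
\end{aligned}$$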

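This leads to:

$$\mathbb{E}_\theta R_p(\theta, \lambda^*) \leq R_p(\theta, \lambda^p) + \mathbb{E}_\theta \sum_{j=1}^{J} \{2\lambda_j^* - (\lambda_j^*)^2\} (\eta_j + 2X_j - \mathrm{pen}_j). \tag{6.15}$$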

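We are now interested in the behavior of the right-hand side of (6.15). First,
using (6.4)-(6.11) in the proof of Theorem 2:

$$\mathbb{E}_\theta \{2\lambda_j^* - (\lambda_j^*)^2\} X_j \leq C\rho_\epsilon \left\{ (1 - \lambda_j^p)^2 \|\theta\|_{(j)}^2 + (\lambda_j^p)^2 \sigma_j^2 \right\} + \bar{C}\rho_\epsilon \mathbb{E}_\theta \left\{ (1 - \lambda_j^*)^2 \|\theta\|_{(j)}^2 + (\lambda_j^*)^2 \sigma_j^2 \right\},$$

for all $j \in \{1, \dots, J\}$. Here, $C$ and $\bar{C}$ denote positive constants independent of $\epsilon$.
In particular, it is always possible to obtain $\bar{C}$ verifying $\bar{C}\rho_\epsilon < 1$ (see the proof
of Theorem 2 for more details). Hence:

$$\mathbb{E}_\theta R_p(\theta, \lambda^*) \leq (1 + C\rho_\epsilon) R_p(\theta, \lambda^p) + \bar{C}\rho_\epsilon \mathbb{E}_\theta R_p(\theta, \lambda^*) + \mathbb{E}_\theta \sum_{j=1}^{J} \{2\lambda_j^* - (\lambda_j^*)^2\} (\eta_j - \mathrm{pen}_j).$$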
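Then, from Assumption A2 and (6.12):

$$\mathbb{E}_\theta \sum_{j=1}^{J} \{2\lambda_j^* - (\lambda_j^*)^2\} (\eta_j - \mathrm{pen}_j) \leq \mathbb{E}_\theta \sum_{j=1}^{J} \left[ \eta_j - \mathrm{pen}_j \right]_+ \leq C_2 \epsilon^2.$$

This leads to:

$$\begin{aligned}
& \mathbb{E}_\theta R_p(\theta, \lambda^*) \leq (1 + C\rho_\epsilon) R_p(\theta, \lambda^p) + \bar{C}\rho_\epsilon \mathbb{E}_\theta R_p(\theta, \lambda^*) + C_2 \epsilon^2 \\
\Leftrightarrow \quad & (1 - \bar{C}\rho_\epsilon) \mathbb{E}_\theta R_p(\theta, \lambda^*) \leq (1 + C\rho_\epsilon) R_p(\theta, \lambda^p) + C_2 \epsilon^2 \\
\Rightarrow \quad & \mathbb{E}_\theta R_p(\theta, \lambda^*) \leq \frac{(1 + C\rho_\epsilon)}{(1 - \bar{C}\rho_\epsilon)} R_p(\theta, \lambda^p) + C \epsilon^2.
\end{aligned} \tag{6.16}$$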

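Using (6.13) and (6.16):

$$\begin{aligned}
\mathbb{E}_\theta \|\theta^\star - \theta\|^2 &\leq (1 + B\rho_\epsilon) \mathbb{E}_\theta R_p(\theta, \lambda^*) + C_2 \epsilon^2 + B\rho_\epsilon R(\theta, \lambda^0), \\
&\leq (1 + \mu_\epsilon) R_p(\theta, \lambda^p) + C \epsilon^2 + B\rho_\epsilon R(\theta, \lambda^0),
\end{aligned}$$

where $\mu_\epsilon = \mu_\epsilon(\rho_\epsilon)$ is such that $\mu_\epsilon \to 0$ as $\rho_\epsilon \to 0$ and $C$ is a positive constant
independent of $\epsilon$. In order to conclude the proof, we just have to compare $R(\theta, \lambda^0)$
to $R_p(\theta, \lambda^p)$. For all $j \in \{1, \dots, J\}$, introduce:

$$R_p^j(\theta, \lambda) = (1 - \lambda_j)^2 \|\theta\|_{(j)}^2 + \lambda_j^2 \sigma_j^2 + 2\lambda_j \mathrm{pen}_j, \quad \text{and} \quad R^j(\theta, \lambda) = (1 - \lambda_j)^2 \|\theta\|_{(j)}^2 + \lambda_j^2 \sigma_j^2.$$

Then:

$$R_p^j(\theta, \lambda^p) \leq R_p^j(\theta, \lambda^0) = R^j(\theta, \lambda^0) + 2 \, \frac{\mathrm{pen}_j}{\sigma_j^2} \, \frac{\sigma_j^2 \|\theta\|_{(j)}^2}{\sigma_j^2 + \|\theta\|_{(j)}^2} = \left( 1 + 2 \, \frac{\mathrm{pen}_j}{\sigma_j^2} \right) R^j(\theta, \lambda^0),$$

since $R_p^j(\theta, \lambda^p) \leq R_p^j(\theta, \lambda^0)$ from the definition of $\lambda^p$. This concludes the proof
of Theorem 4.

□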